METHOD, SYSTEM, AND PROGRAM FOR
RETAINING VERSIONS OF FILES 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

[0001] The present invention is related to a method, system, and program for 
retaining versions of files. 

2. Description of the Related Art 

[0002] Users may maintain copies of different versions of a file in order to allow 
the user to revert back to a previous version, such as versions between scheduled 
backups. The user may have to install a storage management application to manage 
versions of a file. Such storage management applications typically utilize customized 
graphical user interfaces (GUIs) and application program interfaces (APIs) to 
interface with the operating system to perform version management related 
operations. Users may have to undergo significant training to learn to use these 
different application programs, which are often complex, especially in enterprise 
computing environments, to manage saved versions of a document. 

SUMMARY OF THE PREFERRED EMBODIMENTS 
[0003] Provided are a method, system, and program for processing a request to 
write to a source file in a storage system. A determination is made as to whether a 
retention rule is provided for the source file. In response to determining that one 
retention rule is provided for the source file, a versioned file name is generated, 
wherein a versioned file comprises the source file at a point-in-time. A command is 
transmitted to a file system to copy the source file data to a versioned file having the 
generated versioned file name and the generated versioned file name is added to a 
retention index file. The retention index file is processed to determine whether to 
purge versioned files according to the retention rule provided for the source file. 



Docket No. SJO920030049US1

Firm No. 0037.0052 

[0004] In further implementations, purging the versioned files comprises
determining versioned files to purge according to the retention rule, deleting the
determined versioned file names from the retention index file, and transmitting a
command to the file system to delete versioned files having the determined versioned
file names.

[0005] Still further, processing the retention index file to determine whether to
purge versioned files according to the retention rule may further comprise sorting the
versioned file names for the source file in the retention index file ordered on a
timestamp included in the versioned file names and selecting versioned files from the
sorted versioned file names to purge.

[0006] In still further implementations, the operations of processing the request,
determining whether one retention rule is provided, generating a new versioned file
name, transmitting the command, adding the generated versioned file name to the
retention index file, and processing the retention index file are performed by a host
system, and the versioned file, source file, and file system are on a remote
storage system. In such implementations, the retention index files may be maintained
in storage local to the host system and accessed locally by the host system to
determine versioned files to purge according to retention rules.
[0007] Still further, the processing of the write request and the retention rules may
be performed by a program executing in a kernel of an operating system.

BRIEF DESCRIPTION OF THE DRAWINGS 
[0008] Referring now to the drawings in which like reference numbers represent 
corresponding parts throughout: 
FIG. 1 illustrates a computing environment in which embodiments of the
invention are implemented;

FIG. 2 provides information maintained in a file retention rule in accordance
with implementations of the invention;

FIG. 3 illustrates information maintained in a versioned file name used with
implementations of the invention;

FIGs. 4 and 5 illustrate file retention operations in accordance with
implementations of the invention; and

FIG. 6 illustrates a computing architecture that may be used to implement the
computing environment described with respect to FIG. 1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[0009] In the following description, reference is made to the accompanying 
drawings which form a part hereof and which illustrate several embodiments of the
present invention. It is understood that other embodiments may be utilized and 
structural and operational changes may be made without departing from the scope of 
the present invention. 

[0010] FIG. 1 illustrates a network computing environment in which embodiments
of the invention may be implemented. A host system 2 includes an operating system
4 and a file system 6 that provides an organization of files stored in a storage device.
The file system 6 may provide a hierarchical tree-like arrangement of files, which
may involve the use of directories and subdirectories in which the files may be stored,
where any directory may comprise a subdirectory of another directory or the root
directory. A file system user interface 8 provides a command line or graphical user
interface to enable the user to explore the file system and perform file system related
operations, such as moving a file to a specified directory, deleting a file, renaming a
file, creating a file, etc. The file system user interface 8 may comprise a file
management program that renders a presentation of the hierarchical arrangement of
files. The file system user interface 8 may comprise a stand-alone file management
program or a file management function accessed through an application program. A
local storage device 12 is accessible to the host system 2, and may comprise an
internal hard disk drive accessible over a host system 2 bus or external storage
attached directly to the host 2 or at a proximate distance over a network.
[0011] The host system 2 may communicate I/O requests over network 20 to a
storage controller 22 directed to files in storage device 24. The storage controller 22
includes an operating system 26 and file system 28 to manage files in the storage
device 24. The storage controller 22 may comprise a server class computing device,
an enterprise storage server, Network Attached Storage (NAS), etc. The storage
device 24 may be internal to the enclosure including the storage controller 22 or in a
separate enclosure coupled to the storage controller 22. The storage device 24
maintains source files 30, which are the files that the host application program 9 and
file system user interface 8 would directly update, and versioned files 32, which
comprise different versions of the source files 30 that are generated when the source
files 30 are updated. The versioned files 32 may be maintained in a separate retention
directory 34 in the storage file system 28, where there may be a separate subdirectory
for each source file for which versions are maintained. In this way, the versioned
files are accessible through a general file system without the need to install and learn
to use a special purpose storage management program.

[0012] The host system 2 further includes a file retention filter 10 program that
intercepts user requests to write to an existing source file 30 from the file system user
interface 8 or from an application program 9 which directs writes to the file system 6.
The application program 9 may comprise any application program known in the art,
e.g., a database program, word processing program, spreadsheet program, etc. In
certain embodiments, the filter 10 executes in a kernel 5 of the operating system 4 as
a high priority task.

[0013] The host file system 6 would communicate I/O requests over the network 20
to the storage file system 28 to access source files 30 stored in the storage device 24.
The rules database 11 provides a list of one or more retention rules to apply to certain
specified source files 30. The rules database 11 may be implemented in any data
structure known in the art, such as an ASCII text file, an Extensible Markup Language
(XML) file, or relational database. The file retention filter 10 would access the rules
database 11 when filtering file operations to determine whether a retention rule
applies to the source file 30 being updated. In certain implementations, the rules
database 11 maintains versioning rules for different files for use by the file system 6,
so that a separate database program and interface is not needed to manage the
different versions.

[0014] The host local storage 12 maintains a local retention index directory 16
including information on versioned files 32 stored in the remote storage device 24
that is used by the file retention filter 10 when applying retention policies.

[0015] The host system 2 may comprise any computing device known in the art,
such as a server class machine, workstation, desktop computer, laptop, handheld
computer, telephony device, etc. The storage device 24 may comprise any storage
device known in the art, such as one or more interconnected disk drives configured as a
Redundant Array of Independent Disks (RAID), Just a Bunch of Disks (JBOD),
Direct Access Storage Device (DASD), a tape storage device, e.g., a tape library, a
virtualization device, one or multiple storage units, etc. The network 20 may
comprise any network known in the art, e.g., Wide Area Network (WAN), Storage
Area Network (SAN), the Internet, an Intranet, wireless network, etc. Alternatively,
the host system 2 may connect to the storage device 24 over a bus interface.
[0016] In implementations where the file retention filter 10 executes in the kernel 5
of the operating system 4, the operations of the file retention filter 10 remain
transparent to the user and the user is unaware of the rule based checking and file
retention management operations the file retention filter 10 performs as an extension
of the operating system 4. Such implementations allow for versioning at the file
system level, so that a separate database program and interfaces are not needed to
manage versions of the source files. Further, in certain implementations, the file
retention filter 10 extension for the file system 6 may be written for different
operating systems and file systems. In this way, the file retention filter 10 would
perform the same functions and operate in a similar manner across file systems,
thereby standardizing the filter operations across operating system platforms and
providing a similar user interface to allow the user to create rules to control the
filtering operations regardless of the operating system and file system in which the
user is operating.

[0017] FIG. 2 illustrates a rule entry 50 in the rules database 11. Each rule entry
may indicate:

File identifier 52: a name of the file to which the retention rule applies.
Alternatively, the file identifier may identify an application or user that
generated the file, so that the retention policy would apply to all files
generated by that application or user.
Retention rule 54: specifies one or more retention rules.

[0018] The retention rule 54 can indicate a maximum number of versions of a
source file 30, i.e., versioned files 32, to maintain. Alternatively, the retention rule 54
can specify a maximum number of versioned files 32 for one source file 30 to
maintain within a given time period, or different maximum numbers of versioned files
to maintain for different time periods. For instance, a rule can specify a maximum
number of versioned files for one source file to maintain over a specified time period,
such as no more than three file versions per day and no file versions older than one
day. The rule may also specify a time cut-off for versioned files, such that versioned
files whose timestamp 64 exceeds the time cut-off are removed.
[0019] Alternatively, the rule may specify a different number of versioned files to
retain for different time periods, so that a set of versioned files is maintained for
each specified time period, independent of other time periods. For instance, the
retention rule 54 may specify one maximum number, e.g., 5, for the past hour,
another maximum number, e.g., 3, for the past day, another maximum number for the
past week, e.g., 2, etc. Such a rule would cause the filter 10 to separately maintain
five versioned files 32 for the past hour, three for the past day, two for the past week,
etc. Such a rule may be desired because the user may want a specific version over a
more recent period, such as the past hour, but may need only a general version over a
longer time period, such as a day, week, month, year, etc. In this way, a multi-time
period retention rule satisfies such retention needs.
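
By way of illustration only, the rule entry 50 of FIG. 2 and the multi-time period
example above might be encoded as in the following Python sketch. The class names,
field names, the sample file name, and the use of seconds to express time periods are
assumptions made for this example and are not taken from the described implementations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HOUR, DAY, WEEK = 3600.0, 86400.0, 604800.0  # time periods in seconds (assumed unit)


@dataclass
class RetentionRule:
    """Hypothetical encoding of retention rule 54."""
    max_versions: Optional[int] = None       # maximum number of versioned files
    max_age_seconds: Optional[float] = None  # time cut-off for versioned files
    # Period length in seconds -> maximum number of versions kept for that period.
    per_period_max: Dict[float, int] = field(default_factory=dict)


@dataclass
class RuleEntry:
    """Hypothetical encoding of rule entry 50."""
    file_identifier: str  # file identifier 52 (here, a file name)
    rule: RetentionRule   # retention rule 54


# The example from the text: 5 for the past hour, 3 for the past day, 2 for the past week.
multi_period = RuleEntry(
    file_identifier="report.doc",
    rule=RetentionRule(per_period_max={HOUR: 5, DAY: 3, WEEK: 2}),
)
```

A single-limit rule, such as "no more than three versions", would populate only
max_versions and leave the other fields at their defaults.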

[0020] The file retention filter 10 would cause the storage of versioned files 32 in
the storage device 24, which may comprise a remote storage device, that are
maintained by the storage file system 28. As discussed, the source 30 and versioned
32 files may be maintained in the same storage device 24 or separate storage devices.
The file retention filter 10 further maintains information on the versioned files 32
stored in the storage device 24 in retention index files 18 that are stored in a local
retention index directory 16 that may be quickly accessed by the host 2 because it is
maintained in local storage 12, such as an attached storage device (external or
internal) or proximate storage device in a network. The local retention index
directory 16 maintains one or more retention index files 18, where each index file 18
may include the names of versioned files for one or more source files.
[0021] FIG. 3 illustrates the format of the name of each versioned file 32, which
would be recorded in the retention index file 18. The versioned file name 60 includes
a base file name 62 component comprising the full or partial name of the source file
being retained and a version timestamp 64 indicating the version. The version
timestamp 64 may be a system timestamp generated by a system clock or a version
number incremented from a previous file version number of the most recently
retained versioned file.
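
As a non-limiting sketch, a versioned file name 60 could be built from, and split back
into, the base file name 62 and the version timestamp 64 as shown below. The "."
separator, the UTC timestamp layout, and the function names are assumptions for
illustration; the text does not fix a concrete name format.

```python
from datetime import datetime, timezone
from typing import Tuple

SEPARATOR = "."                 # assumed separator between base name 62 and timestamp 64
STAMP_FORMAT = "%Y%m%dT%H%M%S"  # assumed layout; sorts chronologically as plain text


def make_versioned_name(base_name: str, when: datetime) -> str:
    """Combine the base file name with a timestamp taken from a clock reading."""
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{base_name}{SEPARATOR}{stamp}"


def split_versioned_name(versioned_name: str) -> Tuple[str, datetime]:
    """Recover the base file name and the timestamp from a versioned file name."""
    base_name, stamp = versioned_name.rsplit(SEPARATOR, 1)
    when = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return base_name, when
```

Splitting on the last separator keeps base names that themselves contain "." intact,
and a fixed-width timestamp lets the names be ordered with an ordinary string sort.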

[0022] FIGs. 4 and 5 illustrate operations performed by the file retention filter 10
in response to receiving a request to write to an existing source file 30 in the storage
device 24. With respect to FIG. 4, in response to intercepting the write request (at
block 100), a loop is performed at blocks 102-108 for each retention rule i in the rules
database 11. A determination is made (at block 104) whether the source file 30 to
update is identified by the file identifier 52 of rule i. For instance, if the file identifier
52 specifies a file name, then the rule i applies to the source file having the name of
the file identifier. Alternatively, if the file identifier 52 specifies a source application
or user that generated the update or created the file, then rule i applies to the source
file having the source application and/or user specified in file identifier 52. If (at
block 104) the rule i does not apply according to the file identifier, then control
proceeds (at block 108) back to block 102 to consider the next rule. If no rule in the
rules database 11 applies, then the file retention filter 10 transmits (at block 110) the
write request to the storage file system 28 in the storage controller 22 to apply the
write to the source file 30 in the storage device 24.
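
The rule-matching loop of blocks 102-108 can be sketched as follows. The tuple
encoding of a rule, the "file", "application", and "user" tags, and the function name
are hypothetical. The text does not say what happens when several rules match, so
this sketch simply returns the first applicable rule.

```python
from typing import Iterable, Optional, Tuple

# (kind, value, label): kind is "file", "application", or "user" (assumed encoding
# of file identifier 52); label stands in for the retention rule 54 itself.
Rule = Tuple[str, str, str]


def find_applicable_rule(
    rules: Iterable[Rule], file_name: str, application: str, user: str
) -> Optional[Rule]:
    """Return the first rule whose file identifier matches the write request."""
    attributes = {"file": file_name, "application": application, "user": user}
    for rule in rules:  # loop over each rule i
        kind, value, _label = rule
        if attributes.get(kind) == value:  # does rule i identify this source file?
            return rule
    return None  # no rule applies: the write is passed through unchanged
```

A return value of None corresponds to the path in which the write request is
transmitted directly to the storage file system 28.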

[0023] If (at block 104) the rule i does apply, then the file retention filter 10
generates (at block 112) a new versioned file name by combining the base name of
the source file 30 with a generated version timestamp 64 (FIG. 3). The version
timestamp 64 may be generated based on a system clock time or may be determined by
incrementing the timestamp for the most recent versioned file 32 for the source file.
The file retention filter 10 then sends (at block 114) a command to the storage file
system 28 to copy the source file 30 to the new versioned file name in the retention
directory 34. After the copying of the content of the source file 30 to the new
versioned file 32 completes, the write request is transmitted (at block 116) to the
storage file system 28 to apply the update to the source file 30.
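
The copy-then-write ordering of blocks 112-116 is illustrated by the sketch below,
which uses ordinary local file operations in place of commands sent to the storage
file system 28. The function name, the directory layout, and the modeling of the
write as a whole-file overwrite are simplifying assumptions.

```python
import shutil
from pathlib import Path


def versioned_write(
    source: Path, retention_dir: Path, version_stamp: str, new_data: bytes
) -> Path:
    """Copy the source file to a versioned file, then apply the write."""
    # One subdirectory per source file inside the retention directory (assumed layout).
    target_dir = retention_dir / source.name
    target_dir.mkdir(parents=True, exist_ok=True)
    versioned = target_dir / f"{source.name}.{version_stamp}"
    shutil.copy2(source, versioned)  # the copy completes before the update is applied
    source.write_bytes(new_data)     # the intercepted write, modeled as an overwrite
    return versioned
```

Because the copy finishes before the write is applied, the versioned file always
holds the source file as it was at the point in time just before the update.
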
[0024] To manage the number of versioned files, the file retention filter 10 applies
(at block 118) a hash function to the name of the source file 30 to determine a
retention index file 18 name in the local retention index directory 16 maintaining
information on the versioned files for the one or more source files whose name
hashes to the retention index file 18 name. The hash may be applied to the full path
name of the source file 30 or the file name only. Further, since different source file
names may hash to the same retention index file 18 name, one retention index file 18
may maintain information on versioned files, i.e., versioned file names, for different
source files 30. If (at block 119) there is no file in the local retention index directory
16 having the determined index file name, i.e., there are no versioned instances of the
source file whose name hashes to that determined index file name, then the file
retention filter 10 generates (at block 120) a new retention index file 18 in the local
retention index directory 16 having the determined retention index file name. If (at
block 119) there is one retention index file 18 having the determined retention index
file name or one was added (at block 120), then the generated versioned file name is
added (at block 122) to the retention index file 18 having the determined index file
name in the local retention index directory 16. In this way, information on the
versioned files 32 for the source files 30 is maintained in the retention index files in
local storage, where local storage may comprise a relatively fast access storage, such
as an internal hard disk drive, external storage attached directly to the host via a bus
interface, or a proximate network storage device.
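
One possible realization of blocks 118-122 is sketched below. The text does not name
a hash function or an index file layout, so the SHA-256 digest, the bucket count, the
"index_NNN.txt" naming, and the one-name-per-line format are all assumptions.

```python
import hashlib
from pathlib import Path

NUM_BUCKETS = 256  # assumed; with few buckets, several source files share one index file


def index_file_name(source_path: str) -> str:
    """Hash the source file name to a retention index file name."""
    digest = hashlib.sha256(source_path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % NUM_BUCKETS
    return f"index_{bucket:03d}.txt"


def record_version(index_dir: Path, source_path: str, versioned_name: str) -> Path:
    """Add a versioned file name to the index file the source name hashes to."""
    index_dir.mkdir(parents=True, exist_ok=True)
    index_file = index_dir / index_file_name(source_path)
    # Append mode creates the index file if it does not exist, else adds to it.
    with index_file.open("a", encoding="utf-8") as handle:
        handle.write(versioned_name + "\n")
    return index_file
```

Since one index file may hold names for several source files, a reader of the index
would still filter entries by base file name before applying a retention rule.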

[0025] After adding the name of the new versioned file to the retention index file 18
for the source file, i.e., the index file to whose name the source file name hashes, a
determination must be made whether versioned files 32 for the source file 30 need to
be purged. The file retention filter 10 (at block 124) sorts the versioned file names
having the base name 62 (FIG. 3) of the source file in the determined retention index
file 18 according to an order based on the timestamp portion 64 of the versioned file
name. Control then proceeds to block 126 in FIG. 5 to determine whether a retention
policy rule indicates that versioned files need to be purged. If (at block 126) the
retention rule i is a maximum number based rule, i.e., versioned files for a source file
must be purged if they exceed a maximum number, and if (at block 128) the number
of sorted versioned file names exceeds the maximum number, then the file retention
filter 10 determines (at block 130) from the retention index file 18 one or more of the
oldest versioned files, based on the sort order on the timestamp 64, that must be
purged to meet the maximum number limit on versioned files. A command is then
issued (at block 132) to the storage file system 28 to delete the versioned files 32
having the determined versioned file names from the storage device 24. Further, the
determined old versioned file names are deleted (at block 134) from the retention
index file 18 for the source file, so that the purging is reflected in the local retention
index file 18 for the source file 30. In this way, the local retention index directory 16
and retention index files therein are used to allow the host system 2 to quickly
determine versioned files that need to be removed without having to scan files at the
remote storage device 24. Such remote scanning can have significant latency
depending on network 20 traffic and the load on the storage controller 22.
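
The maximum number based selection of blocks 124-130 might look like the following
sketch. It assumes the index entries for one source file have already been read into
(name, timestamp) pairs; the function name and that representation are hypothetical.

```python
from typing import List, Tuple


def select_oldest_to_purge(
    versions: List[Tuple[str, float]], max_versions: int
) -> List[str]:
    """Return the names of the oldest versioned files beyond the maximum number."""
    ordered = sorted(versions, key=lambda v: v[1])  # sort on timestamp, oldest first
    excess = len(ordered) - max_versions
    if excess <= 0:
        return []  # limit not exceeded, nothing to purge
    return [name for name, _ in ordered[:excess]]
```

The returned names would then be removed from the retention index file and passed in
a delete command to the storage file system.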

[0026] If (at block 136) the rule is age based, then the file retention filter 10
determines (at block 138) versioned file names that exceed the age rule based on the
timestamp portion of the file name (if any) and then proceeds to block 132 to issue a
command to delete the determined versioned file names from the retention index file
18 and the actual corresponding versioned files 32 in the retention directory 34 in the
storage device 24 to remove those versioned files 32 whose timestamp exceeds the
age rule.
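
An age based selection (block 138) reduces to a comparison of each timestamp against
the current time, as in this sketch. Timestamps expressed as seconds and the function
name are assumptions for illustration.

```python
from typing import List, Tuple


def select_expired(
    versions: List[Tuple[str, float]], now: float, max_age_seconds: float
) -> List[str]:
    """Return the names of versioned files older than the age limit."""
    return [name for name, stamp in versions if now - stamp > max_age_seconds]
```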

[0027] If (at block 140) the retention rule specifies a maximum number of
versioned files for a specified time period, then the file retention filter 10 determines
(at block 142) the versioned file names of the sorted names that fall out of the
specified time period based on the version timestamp 64 portion of the sorted
versioned file names. Control then proceeds (at block 144) to block 132 to delete all
the determined versioned file names falling outside of the specified time period (if
there are any) from the retention index file 18 and delete the actual versioned files 32
having the determined names from the storage device 24. The file retention filter 10
further determines (at block 146) the sorted versioned file names that fall within the
specified time period. If (at block 148) the number of determined versioned file
names that fall within the specified time period exceeds the maximum
number specified in the retention rule i, then the file retention filter 10 determines (at
block 150) versioned file names that fall within the time period to purge to satisfy the
maximum number limit for the time period according to a selection criteria. The
selection criteria for the rule i may specify to purge the oldest versioned files based
on the timestamp 64, or delete certain files within the specified time period so the
files remaining within the time period have timestamps 64 distributed throughout the
time period. Control then proceeds (at block 152) to block 132 to delete the
determined versioned file names within the time period and the corresponding
versioned files 32 in the storage device 24. As discussed, certain retention rules may
separately maintain versioned files for different time periods. In such case, the file
retention filter 10 would consider the versioned files for each time period to
determine whether versioned files for a specific time period need to be purged.
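
The time period rule of blocks 140-152, with both selection criteria, can be sketched
as follows. The function name and the "spread" flag are hypothetical, and the spread
option keeps versions at evenly spaced positions in the sort order, which only
approximates timestamps distributed throughout the period.

```python
from typing import List, Tuple


def select_period_purge(
    versions: List[Tuple[str, float]],
    now: float,
    period_seconds: float,
    max_in_period: int,
    spread: bool = False,
) -> List[str]:
    """Return names to purge: all outside the period, plus any excess inside it."""
    ordered = sorted(versions, key=lambda v: v[1])  # oldest first
    outside = [name for name, stamp in ordered if now - stamp > period_seconds]
    inside = [(name, stamp) for name, stamp in ordered if now - stamp <= period_seconds]
    count = len(inside)
    if count <= max_in_period:
        return outside
    if not spread:
        # Criterion 1: purge the oldest versions inside the period.
        return outside + [name for name, _ in inside[: count - max_in_period]]
    # Criterion 2: keep versions at evenly spaced positions, purge the rest.
    if max_in_period <= 0:
        keep = set()
    elif max_in_period == 1:
        keep = {count - 1}  # keep only the newest
    else:
        step = (count - 1) / (max_in_period - 1)
        keep = {round(i * step) for i in range(max_in_period)}
    return outside + [name for i, (name, _) in enumerate(inside) if i not in keep]
```

For a rule with several time periods, the same function could be applied once per
period, each with its own period length and maximum number.
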
[0028] Any alternative retention rules would be applied (at block 154) to determine
whether to delete versioned file names from the retention index file 18 and the
corresponding versioned files from the storage device 24. Further, if the number of
versioned file names in the retention index file 18 for the source file 30 does not exceed
the number and/or age limits, then no purging would be performed with respect to the
retention index file 18.

[0029] The described implementations provide techniques to allow file retention
policies to be implemented at a local host system with respect to source files and the
versioned files of the source files that are stored on a remote computer. Further, in
certain implementations, the file retention management operations are implemented
as an extension of the file system. The file retention filter maintains a database of
rules and versioned files using local file system constructs, thereby, in certain
implementations, avoiding the need to install and use a separate database application
program and interfaces to manage and maintain versioned files.

Additional Implementation Details 
[0030] The file retention operations described herein may be implemented as a
method, apparatus or article of manufacture using standard programming and/or
engineering techniques to produce software, firmware, hardware, or any combination
thereof. The term "article of manufacture" as used herein refers to code or logic
implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate
Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer
readable medium, such as magnetic storage medium (e.g., hard disk drives, floppy
disks, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), volatile and
non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs,
SRAMs, firmware, programmable logic, etc.). Code in the computer readable
medium is accessed and executed by a processor. The code in which preferred
embodiments are implemented may further be accessible through a transmission
media or from a file server over a network. In such cases, the article of manufacture
in which the code is implemented may comprise a transmission media, such as a
network transmission line, wireless transmission media, signals propagating through
space, radio waves, infrared signals, etc. Thus, the "article of manufacture" may
comprise the medium in which the code is embodied. Additionally, the "article of
manufacture" may comprise a combination of hardware and software components in
which the code is embodied, processed, and executed. Of course, those skilled in
the art will recognize that many modifications may be made to this configuration
without departing from the scope of the present invention, and that the article of
manufacture may comprise any information bearing medium known in the art.
[0031] In described implementations, the file retention rules are defined in a rule
database. In alternative implementations, the file retention rules may be defined with
attributes associated with a file or directory, so that the rule applies to the file or all
files in a directory. For instance, the user may associate a file retention rule with the
attributes defined for a directory of the file system. In certain operating systems,
such as the MICROSOFT WINDOWS operating system, the attributes that may be
assigned to a directory are accessed by right clicking a mouse button over the name
of the directory displayed in a user interface window to display a menu, and then
selecting the properties option displayed in the menu. (Microsoft and Windows are
registered trademarks of Microsoft Corporation).

[0032] In certain described implementations, the file retention filter 10 is shown as
a separate program component. The file retention filter 10 may be installed
separately from the file system 6, such as a separately installed application program
that runs when the operating system 4 and file system 6 are initialized and screens
files the user is attempting to modify or move. Alternatively, the functionality of the
file filter may be incorporated directly into the operating system and be made
available as a feature of the file system installed with the operating system.
[0033] In the described implementations, the rules database 11 is implemented in a file
and information on versioned files is maintained in files in the file system. In
alternative implementations, the file system may issue function calls to a separately
installed application program, such as a database program, to determine information
on versioned files, where such separately installed application program would
maintain information on versioned files.

[0034] FIGs. 4 and 5 describe specific operations occurring in a particular order. In
alternative implementations, certain operations may be performed in a different order,
modified or removed. Moreover, steps may be added to the above described logic and
still conform to the described implementations. Further, operations described herein
may occur sequentially or certain operations may be processed in parallel. Yet
further, operations may be performed by a single processing unit or by distributed
processing units.

[0035] FIG. 6 illustrates one implementation of a computer architecture 200 of the
host system 2 shown in FIG. 1. The architecture 200 may include a processor 202
(e.g., a microprocessor), a memory 204 (e.g., a volatile memory device), and storage
206 (e.g., a non-volatile storage, such as magnetic disk drives, optical disk drives, a
tape drive, etc.). The storage 206 may comprise an internal storage device or an
attached or network accessible storage. Programs in the storage 206 are loaded into
the memory 204 and executed by the processor 202 in a manner known in the art.
The architecture further includes a network card 208 to enable communication with a
network. An input device 210 is used to provide user input to the processor 202, and
may include a keyboard, mouse, pen-stylus, microphone, touch sensitive display
screen, or any other activation or input mechanism known in the art. An output
device 212 is capable of rendering information transmitted from the processor 202, or
other component, such as a display monitor, printer, storage, etc.
[0036] The foregoing description of the implementations has been presented for the 
purposes of illustration and description. It is not intended to be exhaustive or to limit 
the invention to the precise form disclosed. Many modifications and variations are 
possible in light of the above teaching. It is intended that the scope of the invention
be limited not by this detailed description, but rather by the claims appended hereto. 
The above specification, examples and data provide a complete description of the 
manufacture and use of the composition of the invention. Since many 
implementations of the invention can be made without departing from the spirit and
scope of the invention, the invention resides in the claims hereinafter appended. 



